{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "## 梯度下降算法"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "---"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "#### 介绍"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "本实验主要对梯度下降算法的基本原理进行了讲解，然后使用手写梯度下降算法解决了线性回归问题。最后对 PyTorch 中的反向传播函数进行了讲解并利用该函数简明快速的完成了损失的求导与模型的训练。"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "#### 知识点"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "- 线性回归\n",
    "- 梯度下降算法\n",
    "- 损失函数"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "---"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### 人工梯度下降算法"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "这里我们使用梯度下降算法来对线性回归问题进行求解。"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "#### 线性回归问题"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "线性回归问题其实就是寻找一条合适的直线（$ y = wx$）用以表示所有的数据点，如下："
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "<img width=\"400px\" src=\"https://doc.shiyanlou.com/courses/2534/1166617/aed9a0c6f6bc1de50f057d1e59e55474-0/wm\">"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "如上图所示，线性回归问题其实就是求解上面的线性函数中最佳的 w 值。合适 $y=wx$ 函数可以表示标签 Y 和数据 X 之间的关系，进而预测新的 x 所对应的 预测值 $y\\_pre_i $，其中  $y\\_pre_i = w\\cdot x_i$ 。那么我们应当用什么来衡量最佳的 w 值呢？"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "我们一般认为预测值 $y\\_pre_i $ 与真实值 $y_i$ 的距离越小，那么该函数就越好，w 的值就越趋近于最佳。在机器学习中，这种计算距离的函数有一个名字，叫做损失函数。定义如下："
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "source": [
    "import numpy as np\r\n",
    "\r\n",
    "# 所有点的预测值和实际值的距离的平方和，再取平均值（这种距离叫做欧氏距离）。\r\n",
    "def loss(y, y_pred):\r\n",
    "    return ((y_pred - y)**2).mean()\r\n",
    "#测试代码\r\n",
    "y_pred = np.array([1,2])\r\n",
    "y = np.array([1,1])\r\n",
    "loss(y, y_pred)"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "0.5"
      ]
     },
     "metadata": {},
     "execution_count": 1
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "source": [
    "import numpy as np\r\n",
    "# 欧氏距离\r\n",
    "def loss(y,y_pred):\r\n",
    "    return ((y_pred - y)**2).mean()\r\n",
    "y_pred = np.array([1,2])\r\n",
    "y = np.array([1,1])\r\n",
    "print(loss(y_pred,y))"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "0.5\n"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "综上，线性回归问题其实就是求解损失函数最小的情况下的 w 值。"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "#### 梯度下降算法"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "梯度下降算法是一种用于求解函数极小值的方法。"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "我们可以把梯度下降算法类比为一个下山的过程。假设一个人被困在山上，需要快速从山上下来，走到山底。 但是，由于山中浓雾很大，导致可视度很低。因此，我们无法确定下山的路径，只能看到周围的一些信息。也就是说我们需要走一步看一步再走一步。此时，我们就可以使用到梯度下降的算法。如下："
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "<img  width=\"400px\" src=\"https://doc.shiyanlou.com/courses/2316/1166617/60b99fc78e60a08f4eb94f814ccd90d7-0/wm\" >"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "我们需要找的损失函数也可以看做一座山，我们的目标就是找到这座山的最小值，即山底。每走一步，我们都会重新找山脚的方向。因为沿着山脚方向走能够使我们最快到达山脚的位置。"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "由于梯度表示的是函数上升最快的方向，因此梯度的反方向也应该是函数下降最快的方向。我们每次到了一个新的位置，就会就计算该位置的梯度，找到下一步下山最快的方向。"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "根据梯度和当前位置更新下一次所在位置的数学表达式如下："
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "$$\\theta^1=\\theta^0-\\alpha  \\cdot \\triangledown J(\\theta^0)$$"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "上面式子展示了损失函数 $J(\\theta)$ 的最小值的求解过程。"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "其中 $\\theta^0$ 表示当前所在位置，$\\theta^1$ 表示下一步的位置，$\\alpha$ 表示步长（即一次更新的距离）,$-\\triangledown J(\\theta^0)$ 表示损失函数的梯度的相反方向。"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "我们可以将损失函数值 J 定义为欧氏距离，如下："
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "$$J = \\frac{1}{N}\\sum_{i=1}^N(w\\cdot x_i - y_i)^2$$"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "损失函数关于 w 的梯度为（此时我们需要求的是 w 的最佳值， w 为变量，因此求损失关于 w 的梯度）："
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "$$ \\frac{\\partial{J}}{\\partial w} = \\frac{1}{N}\\sum_{i=1}^N(2wx_{i}^2-2x_{i}y_{i}) $$"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "根据上面的梯度公式，让我们来定义损失函数的梯度计算公式："
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "source": [
    "#返回dJ/dw\r\n",
    "def gradient(x, y, w):\r\n",
    "    return np.mean(2*w*x*x-2*x*y)\r\n",
    "## 测试代码\r\n",
    "x = np.array([1,2])\r\n",
    "y = np.array([1,1])\r\n",
    "gradient(x, y, 2)"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "7.0"
      ]
     },
     "metadata": {},
     "execution_count": 3
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "source": [
    "def gradient(x,y,w):\r\n",
    "    return np.mean(2*w*x*x-2*x*y)\r\n",
    "x = np.array([1,2])\r\n",
    "y = np.array([1,1])\r\n",
    "gradient(x,y,2)"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "7.0"
      ]
     },
     "metadata": {},
     "execution_count": 4
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "#### 人工实现梯度下降算法"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "假设 w 为损失函数需要求的变量，那么梯度下降算法的具体步骤如下："
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "1. 随机初始化一个 w 的值。\n",
    "2. 在该 w 下进行正向传播，得到所有 x 的预测值 y_pre。\n",
    "3. 通过实际的值 y 和预测值 y_pre 计算损失。\n",
    "4. 通过损失计算梯度 dw。\n",
    "5. 更新$ w：w = w-lr\\cdot dw$，其中 $lr$ 为步长，可自定义具体的值。\n",
    "6. 重复步骤 $2-5$，直到损失降到较小位置。"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "首先，让我们先来定义一下梯度下降算法所需要的数据集和变量值:"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "source": [
    "# 正向传播，计算预测值\r\n",
    "def forward(x):\r\n",
    "    return w * x\r\n",
    "# 定义数据集合和 w 的初始化\r\n",
    "X = np.array([1, 2, 3, 4], dtype=np.float32)\r\n",
    "Y = np.array([2, 4, 6, 8], dtype=np.float32)\r\n",
    "w = 0.0\r\n",
    "# 定义步长和迭代次数\r\n",
    "learning_rate = 0.01\r\n",
    "n_iters = 20"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "source": [
    "def forward(x):\r\n",
    "    return w*x\r\n",
    "X = np.array([1,2,3,4],dtype=np.float32)\r\n",
    "Y = np.array([2,4,6,8],dtype=np.float32)\r\n",
    "w = 0.0\r\n",
    "learning_rate = 0.01\r\n",
    "n_iters = 20"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "接下来，让我们根据上面步骤，利用梯度下降算法求解一元回归函数中的  w 的值："
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "source": [
    "for epoch in range(n_iters):\r\n",
    "    y_pred = forward(X)\r\n",
    "    #计算损失\r\n",
    "    l = loss(Y, y_pred)\r\n",
    "    #计算梯度\r\n",
    "    dw = gradient(X, Y, w)\r\n",
    "    #更新权重 w\r\n",
    "    w -= learning_rate * dw\r\n",
    "\r\n",
    "    if epoch % 2 == 0:\r\n",
    "        print(f'epoch {epoch+1}: w = {w:.3f}, loss = {l:.8f}')\r\n",
    "     \r\n",
    "print(f'根据训练模型预测，当 x =5 时，y 的值为： {forward(5):.3f}')"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "epoch 1: w = 0.300, loss = 30.00000000\n",
      "epoch 3: w = 0.772, loss = 15.66018677\n",
      "epoch 5: w = 1.113, loss = 8.17471600\n",
      "epoch 7: w = 1.359, loss = 4.26725292\n",
      "epoch 9: w = 1.537, loss = 2.22753215\n",
      "epoch 11: w = 1.665, loss = 1.16278565\n",
      "epoch 13: w = 1.758, loss = 0.60698175\n",
      "epoch 15: w = 1.825, loss = 0.31684822\n",
      "epoch 17: w = 1.874, loss = 0.16539653\n",
      "epoch 19: w = 1.909, loss = 0.08633806\n",
      "根据训练模型预测，当 x =5 时，y 的值为： 9.612\n"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "source": [
    "for epoch in range(n_iters):\r\n",
    "    # 前向传播\r\n",
    "    y_pred = forward(X)\r\n",
    "    # 计算损失\r\n",
    "    l = loss(Y,y_pred)\r\n",
    "    # 反向传播计算梯度\r\n",
    "    dw = gradient(X,Y,w)\r\n",
    "    # 更新权重\r\n",
    "    w -= learning_rate*dw\r\n",
    "\r\n",
    "    # 每2轮打印\r\n",
    "    if epoch%2==0:\r\n",
    "        print(f\"epoch {epoch} : w={w:.3f}, loss={l:.8f}\")\r\n",
    "print(f'根据模型的预测,当x=5的时候,y的值为{forward(5):.3f}')"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "epoch 0 : w=2.000, loss=0.00000000\n",
      "epoch 2 : w=2.000, loss=0.00000000\n",
      "epoch 4 : w=2.000, loss=0.00000000\n",
      "epoch 6 : w=2.000, loss=0.00000000\n",
      "epoch 8 : w=2.000, loss=0.00000000\n",
      "epoch 10 : w=2.000, loss=0.00000000\n",
      "epoch 12 : w=2.000, loss=0.00000000\n",
      "epoch 14 : w=2.000, loss=0.00000000\n",
      "epoch 16 : w=2.000, loss=0.00000000\n",
      "epoch 18 : w=2.000, loss=0.00000000\n",
      "根据模型的预测,当x=5的时候,y的值为10.000\n"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "从结果可以很清晰的函数，我们利用梯度下降算法，不断的降低损失的值，寻找最佳的权重 w。当损失不再发生变化时，证明我们已经找到了一个较小的值，此时的 w 就是较佳权重（根据结果可以看到，w 的值无限接近于 2）。即线性函数 $y=2x$ 可以很好的表示上面的数据集合。"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### 利用 PyTorch 实现梯度下降算法"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "由于线性函数的损失函数的梯度公式很容易被推导出来，因此我们能够手动的完成梯度下降算法。但是，在很多机器学习中，模型的函数表达式是非常复杂的，这个时候手动定义该函数的梯度函数需要很强的数学功底。因此，这里我们使用上一个实验中所用的后向传播函数来实现梯度下降算法，求解最佳权重 w。 "
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "首先，让我们来定义数据集合以及 w 的初始值，并将其设置为可以求偏导的张量。"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "source": [
    "import torch\r\n",
    "X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)\r\n",
    "Y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)\r\n",
    "#初始化张量 w\r\n",
    "w = torch.tensor(0.0, dtype=torch.float32, requires_grad=True)\r\n",
    "# 定义步长和迭代次数\r\n",
    "learning_rate = 0.01\r\n",
    "n_iters = 20"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "source": [
    "import torch\r\n",
    "X = torch.tensor([1,2,3,4],dtype=torch.float32)\r\n",
    "Y = torch.tensor([2,4,6,8],dtype=torch.float32)\r\n",
    "# 初始化w\r\n",
    "w = torch.tensor(0.0,dtype=torch.float32,requires_grad=True)\r\n",
    "# 定义步长和迭代次数\r\n",
    "learning_rate = 0.01\r\n",
    "n_iter = 20"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "接下来让我们使用 `.backward()` 直接求解梯度:"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "source": [
    " for epoch in range(n_iters):\r\n",
    "    y_pred = forward(X)\r\n",
    "    l = loss(Y, y_pred)\r\n",
    "    # 无需定义梯度求解的函数，直接求解梯度\r\n",
    "    l.backward()\r\n",
    "    # 利用梯度下降更新参数\r\n",
    "    with torch.no_grad():\r\n",
    "        # w.grad :返回 w 的梯度\r\n",
    "        w.data -= learning_rate * w.grad\r\n",
    "    \r\n",
    "    # 清空梯度\r\n",
    "    w.grad.zero_()\r\n",
    "\r\n",
    "    if epoch % 2 == 0:\r\n",
    "        print(f'epoch {epoch+1}: w = {w.item():.3f}, loss = {l.item():.8f}')\r\n",
    "print(f'根据训练模型预测，当 x =5 时，y 的值为： {forward(5):.3f}')"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "epoch 1: w = 0.300, loss = 30.00000000\n",
      "epoch 3: w = 0.772, loss = 15.66018772\n",
      "epoch 5: w = 1.113, loss = 8.17471695\n",
      "epoch 7: w = 1.359, loss = 4.26725292\n",
      "epoch 9: w = 1.537, loss = 2.22753215\n",
      "epoch 11: w = 1.665, loss = 1.16278565\n",
      "epoch 13: w = 1.758, loss = 0.60698116\n",
      "epoch 15: w = 1.825, loss = 0.31684780\n",
      "epoch 17: w = 1.874, loss = 0.16539653\n",
      "epoch 19: w = 1.909, loss = 0.08633806\n",
      "根据训练模型预测，当 x =5 时，y 的值为： 9.612\n"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "source": [
    "for epoch in range(n_iters):\r\n",
    "    # 前向传播\r\n",
    "    y_pred = forward(X)\r\n",
    "    # 计算损失\r\n",
    "    l = loss(Y,y_pred)\r\n",
    "    # 反向传播\r\n",
    "    l.backward()\r\n",
    "    # 利用梯度下降更新参数\r\n",
    "    with torch.no_grad():\r\n",
    "        # 更新梯度的计算不需要被纳入到计算图中\r\n",
    "        # w.grad:就是w的梯度\r\n",
    "        w.data -= learning_rate * w.grad\r\n",
    "    # 清空梯度进行下一次\r\n",
    "    w.grad.zero_()\r\n",
    "    if epoch%2==0:\r\n",
    "        # w.item(),把w从[val]拿出来变成val\r\n",
    "        print(f'epoch {epoch+1}: w={w.item():.3f},loss = {l.item():.8f}')\r\n",
    "print(f'根据模型来进行预测 当x=5时,y的值为:{forward(5):.3f}')"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "epoch 1: w=2.000,loss = 0.00000010\n",
      "epoch 3: w=2.000,loss = 0.00000005\n",
      "epoch 5: w=2.000,loss = 0.00000003\n",
      "epoch 7: w=2.000,loss = 0.00000001\n",
      "epoch 9: w=2.000,loss = 0.00000001\n",
      "epoch 11: w=2.000,loss = 0.00000000\n",
      "epoch 13: w=2.000,loss = 0.00000000\n",
      "epoch 15: w=2.000,loss = 0.00000000\n",
      "epoch 17: w=2.000,loss = 0.00000000\n",
      "epoch 19: w=2.000,loss = 0.00000000\n",
      "根据模型来进行预测 当x=5时,y的值为:10.000\n"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "可以看到，利用 PyTorch 进行的梯度下降的结果和人工梯度下降结果一致。我们可以通过 PyTorch 中的 `.backward()`，简洁明了的求取任何复杂函数的梯度，大大的节约了我们公式推导的时间。"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### 实验总结"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "当然，本实验只是利用 `.backward()`对损失进行了求导，其实 PyTorch 中还有很多用于梯度下降算法的工具包。我们可以使用这些工具包完成损失函数的定义、损失的求导以及权重的更新等各种操作。在下一个实验中，我们将对这些工具包进行详细的讲解。"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "<hr><div style=\"color: #999; font-size: 12px;\"><i class=\"fa fa-copyright\" aria-hidden=\"true\"> 本课程内容版权归蓝桥云课所有，禁止转载、下载及非法传播。</i></div>"
   ],
   "metadata": {}
  }
 ],
 "metadata": {
  "kernelspec": {
   "name": "python3",
   "display_name": "Python 3.8.0 64-bit ('pytorch': conda)"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.0"
  },
  "interpreter": {
   "hash": "95edf26445b41d81dc60008cc593bb3c243ca80a3a822915e2b6f7013280bc10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}